VISUALIZATION AND SELF-ORGANIZATION OF MULTIDIMENSIONAL DATA
THROUGH EQUALIZED ORTHOGONAL MAPPING 

Related Applications

The present application is a Continuation-In-Part (CIP) of co-pending U.S.
Application Serial No. 08/536,059 filed September 29, 1995.

Background of the Invention 
This application pertains to the art of artificial intelligence, and more
particularly to a system for organizing a large body of pattern data so as to
facilitate understanding of features.

The subject system has particular application to analysis of acquired, empirical
data, such as chemical characteristic information, and will be described with particular
reference thereto. However, it will be appreciated that the subject system is suitably adapted
to analysis of any set of related data so as to allow for visualization and understanding of the
constituent elements thereof.

It is difficult to make sense out of a large body of multi-featured pattern data. 
Actually the body of data need not be large; even a set of 400 patterns each of six features 
would be quite difficult to "understand." A concept of self-organization has to do with that 
type of situation and can be understood in terms of two main approaches to that task. In one
case, an endeavor is directed to discovering how the data are distributed in pattern space, with 
the intent of describing large bodies of patterns more simply in terms of multi-dimensional 
clusters or in terms of some other distribution, as appropriate. This is a dominant concern 
underlying the Adaptive Resonance Theory (ART) and other cluster analysis approaches. 



In a remaining case, effort is devoted to dimension reduction. The
corresponding idea is that the original representation, having a large number of features, is
redundant in its representation, with several features being near repetitions of each other. In
such a situation, a principal feature extraction which is accompanied by dimension reduction
may simplify the description of each and all the patterns. Clustering is suitably achieved
subsequently in the reduced dimension space. The Karhunen-Loeve (K-L) transform, neural-net
implementations of the K-L transform, and the auto-associative mapping approach are all
directed to principal component analysis (PCA), feature extraction and dimension reduction.

In actuality the two streams of activity are not entirely independent. For
example the ART approach has a strong "winner-take-all" mechanism in forming its clusters.
It is suitably viewed as "extracting" the principal prototypes, and forming a reduced
description in terms of these few principal prototypes. The feature map approach aims at
collecting similar patterns together through lateral excitation-inhibition so that patterns with
similar features are mapped into contiguous regions in a reduced dimension feature map. That
method clusters and reduces dimensions. The common aim is to let data self organize into a
simpler representation.

A new approach to this same task of self-organization is described herein.
The idea is that data be subjected to a nonlinear mapping from the original representation to
one of reduced dimensions. Such mapping is suitably implemented with a multilayer
feedforward neural net. Net parameters are learned in an unsupervised manner based on the
principle of conservation of the total variance in the description of the patterns.



The concept of dimension reduction is somewhat strange in itself. It raises the
question of how a reduced-dimension description of a body of pattern data can be
representative of the original body of data. The answer is known for the linear case, but is
more difficult to detail in the general nonlinear case.

A start of the evolution leading to the subject invention may be marked by 
noting the concept of principal component analysis (PCA) based on the Karhunen-Loeve (K-L) 
transform. Eigenvectors of a data co-variance matrix provide a basis for an uncorrelated 
representation of associated data. Principal components are those which have larger 
eigenvalues, namely those features (in transformed representation) which vary greatly from 
pattern to pattern. If only a few eigenvalues are large, then a reduced dimension 
representation is suitably fashioned in terms of those few corresponding eigenvectors, and 
nearly all of the information in the data would still be retained. That utilization of the 
Karhunen-Loeve transform for PCA purposes has been found to be valuable in dealing with 
many non-trivial problems. But in pattern recognition, it has a failing insofar as what is 
retained is not necessarily that which helps interclass discrimination. 
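
The linear K-L procedure described above can be sketched in a few lines. The following is a minimal illustration only, assuming NumPy; the function name `kl_transform`, the synthetic 400-pattern, six-feature data set, and the noise level are hypothetical choices made for this example and are not taken from the specification.

```python
import numpy as np

def kl_transform(X, n_components):
    """Project the rows of X onto the leading eigenvectors of the data covariance matrix."""
    Xc = X - X.mean(axis=0)                     # remove the mean of each feature
    cov = Xc.T @ Xc / Xc.shape[0]               # data covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # reorder so the largest eigenvalue comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    Z = Xc @ eigvecs[:, :n_components]          # uncorrelated, reduced-dimension representation
    retained = eigvals[:n_components].sum() / eigvals.sum()
    return Z, retained

# Hypothetical data: 400 patterns of six features that are really mixtures of two factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(400, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.01 * rng.normal(size=(400, 6))

Z, retained = kl_transform(X, 2)
print(Z.shape, float(retained))
```

Because only two eigenvalues are large in this synthetic case, the two retained components carry nearly all of the variance, which is the situation the paragraph above describes.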

Subsequent and somewhat related developments sought to link the ideas of 
PCA, K-L transform and linear neural networks. Such efforts sought to accomplish a linear 
K-L transform through neural-net computing, with fully-connected multilayer feedforward nets 
with the backpropagation algorithm for learning the weights, or with use of a Generalized 
Hebbian Learning algorithm. In this system, given a correct objective function, weights for 
the linear links to any of the hidden layer nodes may be noted to be the components of an
eigenvector of the co-variance matrix. Earlier works also described how principal components
may be found sequentially, and how that approach may avoid a tedious task of evaluating all 
the elements of a possibly very large co-variance matrix. 

The earlier works begged the question of what might be achieved if the neurons 
in the networks were allowed to also be nonlinear. Other efforts sought to address that 
question. In one case, the original data pattern vectors are subjected to many layers of 
transformation in a multilayer feedforward net, but one with nonlinear internal layer nodes. 
An output layer of such a net has the same number of nodes as the input layer and an objective 
is to train the net so that the output layer can reproduce the input for all inputs. This provides
a so-called auto-associative learning configuration. In addition, one of the internal layers 
serves as a bottle-neck layer, having possibly a drastically reduced number of nodes. Now, 
since the outputs from that reduced number of nodes can closely regenerate the input, in all 
cases, the nodes in the bottle-neck layer might be considered to be a set of principal 
components. That may prove to be an acceptable viewpoint, except for the fact that the 
solutions attained in such learning are not unique and differ radically depending on initial 
conditions and the order in which the data patterns are presented in the learning phase. 
Although the results are interesting, there is no unique set of principal components. 

In another earlier feature map approach, dimension reduction is attained in yet
another manner. A reduced-dimension space is suitably defined as two dimensional. The
reduced-dimension space is then spanned by a grid of points and a pattern vector is attached to
each of those grid points. These pattern vectors are chosen randomly from the same pattern
space as that of the problem. Then the pattern vectors of the problem are allocated to the grid
points of the reduced-dimension space on the basis of similarity to the reference vector
attached to the grid. This leads to a biology inspired aspect of the procedure, namely that of
lateral excitation-inhibition. When a pattern vector is allocated to a grid point, at first it would
be essentially at random, because of that grid point happening to have a reference vector
most similar to the pattern vector. But once that allocation is made, the reference vector is
modified to be even more like that of the input pattern vector and furthermore, all the
reference vectors of the laterally close grid points are modified to be more similar to that input
pattern also. In this way, matters are soon no longer left to chance; patterns which are similar
in the original pattern space are in effect collected together in reduced dimension space.
Depending on chance, sometimes two or more rather disparate zones can be built up for
patterns which could have been relegated to contiguous regions if things had progressed
slightly differently. On the other hand, results of that nature may not be detrimental to the
objectives of the computational task.
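
The allocation-and-update cycle of the feature map approach can be sketched as follows. This is only an illustrative sketch under stated assumptions: the Gaussian neighborhood function, the linearly decaying learning rate and neighborhood width, the 5 x 5 grid, and the names `train_feature_map` and `allocate` are hypothetical choices for this example, and the reference vectors are initialized here by sampling the data patterns themselves.

```python
import numpy as np

def train_feature_map(X, grid_shape=(5, 5), epochs=20, lr=0.5, radius=1.5, seed=0):
    """Attach a reference vector to each grid point and adapt it by lateral excitation."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Reference vectors start as randomly chosen patterns (an assumption of this sketch).
    refs = X[rng.integers(0, len(X), size=rows * cols)].astype(float)
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs                # shrinks from 1 toward 1/epochs
        width = radius * decay + 1e-3               # neighborhood width on the grid
        for x in X[rng.permutation(len(X))]:
            # Allocate the pattern to the grid point with the most similar reference vector.
            winner = np.argmin(((refs - x) ** 2).sum(axis=1))
            # Laterally close grid points are pulled toward the pattern as well.
            grid_dist2 = ((coords - coords[winner]) ** 2).sum(axis=1)
            h = np.exp(-grid_dist2 / (2.0 * width ** 2))
            refs += (lr * decay) * h[:, None] * (x - refs)
    return coords, refs

def allocate(X, refs):
    """Return, for each pattern, the index of the grid point it is allocated to."""
    d2 = ((X[:, None, :] - refs[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 6))                      # hypothetical six-feature patterns
coords, refs = train_feature_map(X)
cells = allocate(X, refs)
print(coords[cells][:5])                            # 2-D grid positions of the first patterns
```

The result depends on the random initialization and presentation order, which is the source of the chance effects noted above.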
The ART approach to self-organization of data can be mentioned in this context
because the MAX-NET implements a winner-take-all approach in building up clusters and
there is indeed lateral inhibition even though it is not related to the distance between cluster
centers in cluster space. There is data compression but no dimension reduction.

According to a first aspect of the present invention, the above-noted problems
and others, are addressed to provide a system for autonomous reduction of pattern dimension
data to a largely unambiguous, two-dimensional representation using an extremely efficient
system.



It is appreciated that many tasks in engineering involve the process of extracting 
useful information from unorganized raw data. However, as discussed above, it is a challenging 
task to make sense out of a large set of multidimensional data. The difficulty mainly lies in the 
fact that the inter-pattern relationship cannot be readily grasped. Visual display has been one of 
the most useful tools to guide this kind of analysis. Unfortunately, a meaningful visual display
is not directly possible for dimensions higher than three.

As indicated above, the complexity of raw data must be reduced in order to 
understand the meaning thereof. Generally, two major categories of approaches are used to 
tackle this problem. In the first category, information such as the Euclidean distance between 
data patterns is used to infer how the data patterns are distributed in the multidimensional space, 
using methods such as clustering or Kohonen's self-organizing map (SOM). The emphasis of 
these methods is to describe large amounts of data patterns more concisely with cluster attributes 

or some other distributions. 

The second category of approaches emphasizes the reduction of dimensions, i.e., 
the reduction of the number of features necessary to describe each and all of the data patterns. 
The idea is that perhaps the dimensions of the original data space are not all independent of each
other, i.e. these dimensions may be some complicated functions of just a few independent 
inherent dimensions albeit not necessarily among those known. Accordingly, the objective is to 
use this reduced-dimension space to describe the patterns. Some methods belonging to this 
category are linear principal component analysis (PCA) through the Karhunen-Loeve (K-L)
transform, neural-net implementations of PCA, the autoassociative mapping approach and the
non-linear variance-conserving (NLVC) mapping. These methods generally try to map the
high-dimensional space to the lower one. There are also methods to do the reverse. An example is
generative topographic mapping (GTM), described in a paper by C. M. Bishop, M. Svensen and
C. K. I. Williams entitled "GTM: The generative topographic mapping."

However, it should be appreciated that the two categories discussed above are not
entirely distinct. Clustering could be used subsequently in the reduced-dimension space to
further help the comprehension of the data. The SOM approach collects similar patterns together
through lateral excitation-inhibition in a reduced-dimension feature map. Therefore, SOM both
clusters and reduces dimension.

Except for linear PCA methods, which are already limited by their linear nature,
the other methods mentioned above either map the high dimensional data to discrete grid points in
the lower dimensional space, or the appearance of the lower dimensional map closely depends on
the initial (usually random) choice of mapping parameters, or both.

The grid point maps are usually useful in applications such as classification and
encoding where exact relative positions of the data points are not of critical importance as long as
close points in original data space remain close in the map. For example, the GTM approach
starts with a grid of points in the lower dimension and a set of non-linear basis functions, which
were assumed to be radially symmetric Gaussians evenly distributed in the lower dimensional
space. A mapping of the grid points from the lower dimension to the higher dimension is
assumed to be of a linear weighted sum of those basis functions. Then, the probability density of
the higher dimension is proposed to be formed by radially symmetric Gaussians centered on
those grid points just mapped to the higher dimension. In Bishop's works on GTM, it is assumed
that the Bayes' rule can be used to invert the mapping and to estimate the responsibility of each
grid point to the distribution in the higher dimensional space. The likelihood of data points in the
higher dimension can then be re-estimated with the responsibility information. By optimizing
this result to give the distribution of the known data points in the higher dimension, the iterative
learning procedure of the weight parameters of the mapping and width parameters of the
Gaussians forming the density distribution is obtained. A lower dimensional map of the data
points for viewing can be generated by the responsibility information upon convergence of the
learning. Provided that the mapping function is smooth and continuous, adjacent points in the
lower dimension will map to adjacent points in the higher dimension. But the reverse is not
necessarily true since for a given data point in the higher dimension the responsibilities of the
Gaussians on grid points may be multi-modal due to the shape of the manifold generated by the
mapping function. Instead of being the responsibility of one or a few adjacent grid points, the
data point may be the responsibility of several distant grid points on the lower dimensional map.
Although such a map may still be useful for some classification and similar purposes, it would be
inappropriate to use this kind of a map for optimization since it would be difficult to interpret
interpolation between grid points on such a map. Other grid point maps such as those obtained
by SOM, may also have the same type of difficulty in interpreting interpolation between grid
points.



Although a non-linear PCA type mapping such as the autoassociative mapping or
NLVC mapping does not have the interpolation difficulty, the appearance of the lower dimensional
map is usually dependent on the choice of initial parameters. This dependence is described
below using NLVC mapping as an example. To obtain a map with good distribution of data
points, a number of trials may be necessary until a satisfactory one can be found.

According to a second aspect of the present invention, the foregoing complexity- 
reduction problems, as well as others, are addressed. In this regard, an approach referred to as 
Equalized Orthogonal Mapping (EOM) is described herein. This approach falls into the second 
category and is developed with considerations on the interpolation capability and reduction of 

dependence on initial parameters in mind. 

The EOM approach can be implemented through a backpropagation learning 
process. The detailed equations for this procedure are derived and described below. Examples 
of use of EOM in obtaining reduced dimension maps and comparisons with the SOM and NLVC 
approaches are also described. Moreover, results are given for two situations. In one case the
input data is seemingly of 5 dimensions but is actually 2-D in nature. In another case, the
mapping is applied to a body of gasoline blending data and potential use of the resulting map for 

optimization is demonstrated. 

It should be appreciated that while the following description of the present 
invention is directed to mapping in cases where the reduced-dimension representation is of 2-D,
so that the representation can be easily visualized, the present invention is suitable for other 
dimensions as well. 
Summary of the Invention
In accordance with the present invention, there is provided a system for
organization of multi-dimensional pattern data into a two-dimensional representation that includes a
neural network. The neural network is comprised of layers of neural nodes. These layers
include an input layer and an output layer and one or more hidden layers disposed
therebetween. The output layer is comprised of first and second non-linear nodes that share a
common internal network representation. Multi-dimensional pattern data are received into the
input layer of the neural network. The system generates an output signal at the output layer of
the neural network, which output signal corresponds to a received multi-dimensional pattern.

In accordance with another aspect of the present invention, there is provided an
organization of multi-dimensional pattern data into a two-dimensional representation to further
include a system for completing supervised learning of weights of the neural network.

In accordance with yet a further aspect of the present invention, there is
provided a method for organization of multi-dimensional pattern data into a two-dimensional
representation which includes the steps of receiving multi-dimensional pattern data into a
neural network and outputting, via the neural network which has been trained by
backpropagation, an output signal. The output signal is generated by an output layer which
consists of first and second non-linear nodes which share a common internal network
representation.

In accordance with a more limited aspect of the method of the subject invention,
there is provided the steps of completing a training of the neural network to accomplish the
foregoing.

According to yet another aspect of the present invention, there is provided a new
approach to dimension-reduction mapping of multidimensional pattern data. This approach
applies the mapping through a conventional single-hidden-layer feed-forward neural network
with non-linear neurons, but with a different objective function which equalizes and
orthogonalizes the lower dimensional outputs by reducing the covariance matrix of the outputs to
the form of a diagonal matrix, or a constant times the identity matrix, rather than specifying what
the outputs should be as in a conventional net used in function estimation. Since no attribute
information is used in this mapping, it is essentially an unsupervised learning procedure. A
detailed backpropagation learning procedure of such a mapping is described herein.
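
The equalizing and orthogonalizing objective can be illustrated by measuring how far the covariance matrix of the outputs is from a constant times the identity matrix. The sketch below is an assumption-laden illustration only: the sum-of-squared-differences error, the function names `output_covariance` and `eom_style_error`, and the four-pattern example outputs are hypothetical and are not the error function of the specification, which is derived elsewhere.

```python
import numpy as np

def output_covariance(O):
    """Covariance matrix of the reduced-dimension outputs (rows are patterns)."""
    Oc = O - O.mean(axis=0)
    return Oc.T @ Oc / O.shape[0]

def eom_style_error(O, target_variance):
    """Squared distance between the output covariance and target_variance * identity."""
    C = output_covariance(O)
    target = target_variance * np.eye(C.shape[0])
    return float(((C - target) ** 2).sum())

# Outputs that are uncorrelated with equal variance: covariance is the identity matrix.
O_equalized = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
# Outputs whose two components are perfectly correlated: off-diagonal terms are non-zero.
O_correlated = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0]])

print(eom_style_error(O_equalized, 1.0))
print(eom_style_error(O_correlated, 1.0))
```

An objective of this kind uses no class or attribute labels, consistent with the unsupervised character noted above.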

In accordance with another aspect of the present invention, there is provided a
method for visualizing large bodies of complex multidimensional data in a relatively
"topologically correct" low-dimension approximation, to reduce the randomness associated with
other methods of similar purposes and to keep the mapping computationally efficient at the same
time. Examples of the use of this approach in obtaining meaningful 2-D maps and comparisons
with self-organizing mapping (SOM) and non-linear variance-conserving (NLVC) mapping
approaches are described herein.

An advantage of the present invention is the provision of a neural network 
which allows for automated creation of a two-dimensional representation of pattern data. 

Still another advantage of the present invention is the creation of a two-dimensional
representation of pattern data which isolates related characteristics disposed within
the pattern data in an unambiguous fashion so as to allow for human visualization and
cataloging of features.

Still another advantage of the present invention is the provision of a neural 
network for organization of pattern data efficiently so as to allow for real-time computation 
with conventional processing hardware. 

Yet another advantage of the present invention is the provision of a system 
which reduces the dimensions of pattern data by controlling the variance. 

Yet another advantage of the present invention is the provision of a system 
which reduces the dimensions of pattern data by reducing the covariance matrix of the outputs
to the form of a diagonal matrix or a constant times the identity matrix. 

Further advantages and benefits of the invention will become apparent to those 
skilled in the art upon a reading and understanding of the following detailed description. 

Brief Description of the Drawings
The invention may take physical form in certain parts and arrangements of 
parts, a preferred embodiment and method of which will be described in detail in this 
specification and illustrated in the accompanying drawings which form a part hereof, and 
wherein: 

Fig. 1 illustrates an earlier, auto-associative network relative to the subject non-linear variance-conserving (NLVC) mapping with dimension reduction employed by the preferred embodiment of the subject invention;

Fig. 2 illustrates a three-dimensional graph of two periods of a helix with 81 points plotted;

Fig. 3 illustrates a two-dimensional representation of a Karhunen-Loeve (K-L) transformation of a helix;

Fig. 4 illustrates a two-dimensional representation of a helix with auto-associative mapping;

Fig. 5 illustrates a two-dimensional representation of a helix with non-linear variance-conserving mapping;

Fig. 6 illustrates a two-dimensional representation of gasoline blending data with non-linear variance-conserving mapping;

Fig. 7 illustrates a sensitivity of positions of blends with respect to the content level of isopentane, cat cracked, and reformate;

Fig. 8 illustrates a two-dimensional representation of sensor data with non-linear variance-conserving mapping;

Fig. 9 illustrates a trace of successive sensor profiles monitoring from "non-fault" to "fault;"

Fig. 10 provides a two-dimensional representation of band gap data with non-linear variance-conserving mapping;

Fig. 11 illustrates a table of benchmark gasoline blending data;

Fig. 12 illustrates a table of time-dependent sensor data profiles;

Fig. 13 provides a table of semi-conductor crystal structure parameters and band gaps;

Fig. 14 illustrates a network structure for Equalized Orthogonal Mapping (EOM);

Fig. 15 illustrates a theoretical 2-dimensional map for an equation providing a solution for a 5-dimensional function with 2 inherent dimensions;

Figs. 16A - 16D illustrate reduced dimension maps for a 5-D function with 2 inherent dimensions, obtained by Self-Organizing Mapping (SOM);

Figs. 17A - 17D illustrate reduced dimension maps for the same 5-D function, obtained by Non-Linear Variance Conserving (NLVC) mapping;

Figs. 18A - 18D illustrate reduced dimension maps for the same 5-D function, obtained by Equalized Orthogonal Mapping (EOM);

Figs. 19A - 19D illustrate reduced dimension maps for the gasoline blending data shown in Fig. 11, as obtained by SOM;

Figs. 20A - 20D illustrate reduced dimension maps for the gasoline blending data shown in Fig. 11, as obtained by NLVC;

Figs. 21A - 21D illustrate reduced dimension maps for the gasoline blending data shown in Fig. 11, as obtained by EOM;

Fig. 22A illustrates a reduced dimension map of six patterns obtained by EOM; and

Fig. 22B illustrates model values of the region shown in Fig. 22A.



Detailed Description of the Preferred Embodiment 
Referring now to the drawings wherein the showings are for the purposes of
illustrating a preferred embodiment of the invention only and not for purposes of limiting
same, Fig. 1 illustrates, in portion (a), an auto-associative approach while portion (b) illustrates
non-linear variance-conserving mapping with dimension reduction as provided by the preferred
embodiment. The two will be described independently. The side-by-side illustration is provided
to show the advantages and distinctions in architecture between the approach (a) of earlier
attempts and the architecture of the subject, preferred embodiment.

In the portion (a), a neural network 10 has an input layer 12 and an output layer 
14. In the illustration of portion (a), the input layer 12 is comprised of five nodes, 20, 22, 24, 
26, and 28, each of which has an associated input thereto. The output layer 14 is also 
illustrated as being comprised of five nodes, 30, 32, 34, 36, and 38. The number of 
illustrated nodes in each of the input layer 12 and the output layer 14 is not limited to five. 
Any plurality may be chosen for such layer and these values need not be identical. Specific 
numbers of nodes are highly application specific. An arbitrary internal layer 40 disposed 
within the neural network 10 is narrowed to two internal nodes 42 and 44. From the 
illustration, it will be appreciated that there is therefore a funneling or necking of all pattern
data that is provided by the illustrated layer 40.

Turning next to portion (b) of the preferred embodiment, neural network 100
includes an input layer 102. Input layer 102, for illustrative purposes only, is noted to
[...] It
will be appreciated that the disclosed architecture of the neural network 100 includes no
internal portion analogous to that internal layer 40 provided by portion (a).

The architecture of (b) provides an output layer 130 comprised of first and
second neurons 132 and 134. Two neurons are chosen at the output layer 130 by way of
example of the preferred embodiment. This choice allows for two-dimensional realization and
visualization of the pattern data. It will be appreciated from the description below that the
output layer of the preferred embodiment is comprised entirely of nonlinear nodes sharing a
common internal network representation. As with the description of (a), it is to be appreciated
that a number of nodes forming input layer 102 is chosen pursuant to the particular application
and hardware choices.

Next, a dimension reduction from 5 to 2 is considered for illustration purposes. 

In the auto-associative approach, a net such as that shown in Fig. 1(a) would be used and it
would be trained so that the net serves as the identity operator. The output vector of the net
would always be nearly the same as the input pattern vector. If the intrinsic dimensionality of
the data is more than 2-D, then the net will be appreciably different from an identity operator.
The net is trained in what is sometimes called a self-supervised manner.

The net to be used in the present new approach is less complex. The intent is to
conserve as much of the data information as possible with a 2-D representation instead of five.
The net for computing the 2-D representation is simply that illustrated in Fig. 1(b), to be
trained according to the criterion that the variance in the 2-D representation be as nearly the
same as that of the 5-D representation. In this approach, it is essential that the output nodes
be nonlinear and that they share a common internal network representation.

Let $\{x_p\}$, $p = 1, 2, \ldots, P$, be a set of data pattern vectors, wherein P is defined as a
positive integer, and wherein the set of data pattern vectors has a total variance given by:

$$V = \frac{1}{P} \sum_{p=1}^{P} \sum_{i=1}^{S} \left( x_{ip} - \langle x_i \rangle \right)^2$$

(Equation 1)

where: dimension S = 5 originally,

the $\langle\ \rangle$ notation denotes the average or mean taken over the set of input data
pattern vectors for each indicated component (i.e., $\langle x_i \rangle$ denotes the mean value of $x_{ip}$
evaluated over the set of data pattern vectors), and

$x_{ip}$ is the ith component of $x_p$, the pth member of a set of data pattern vectors.
It should be understood that a "measure" of the total variance is a linear or non-linear function
of the total variance.
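
As a small worked illustration of Equation 1, the total variance can be computed directly from a matrix whose rows are the pattern vectors. The function name `total_variance` and the four two-feature patterns below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def total_variance(X):
    """Total variance V of Equation 1: mean over patterns of the squared deviation, summed over features."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)                         # <x_i> for each feature i
    return float(((X - mean) ** 2).sum() / X.shape[0])

# Four patterns with two features; feature means are 1 and 2.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 4.0], [2.0, 4.0]])
V = total_variance(X)
print(V)  # each pattern deviates by (1, 2) in magnitude, so V = 1 + 4 = 5.0
```

The same quantity equals the trace of the data covariance matrix when that matrix is normalized by P.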

The net shown in Fig. 1(b) is now trained so that the variance calculated in the
reduced dimension representation, for the same set of data pattern vectors, is as close to V as
possible.

Using the Backpropagation algorithm for training, the expressions for
progressively changing the weights for the output nodes are as usual:

$$\Delta w_{kj} = \frac{1}{P} \sum_{p=1}^{P} \Delta w_{p,kj} = \frac{1}{P} \sum_{p=1}^{P} \eta\, \delta_{pk}\, O_{pj}$$

(Equation 2)

where all the symbols have their usual conventional meanings. In this respect, $O_{pj}$ is the
output signal from the jth node in the layer preceding the output layer due to the pth data
pattern, $\eta$ is a constant of suitable value chosen to provide efficient convergence but to avoid
oscillation, and $\delta_{pk}$ is a value proportional to the contribution to the error E by the outputs of
the kth node of the output layer for the pth input data pattern (i.e., a measure of the sensitivity
of the variance).

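The difference between the present case and the standard supervised learning
task of neural-net computing comes in the expression for $\delta_{pk}$ which in this case (i.e.,
sigmoidal) is given by:

$$\delta_{pk} = \left[ V - \frac{1}{P} \sum_{q=1}^{P} \sum_{n} \left( O_{qn} - \langle O_n \rangle \right)^2 \right] \left( O_{pk} - \langle O_k \rangle \right) O_{pk} \left( 1 - O_{pk} \right)$$

(Equation 3)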

In equation (3), V is the variance calculated for the training set of input data
patterns, and the double sum within the rectangular brackets yields the variance for the output,
reduced-dimension, representation of the same data. The effect of the learning procedure is to
try to decrease the delta values while minimizing the difference between the original and
reduced-dimension variances as much as possible.

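From the illustration of Fig. 1, it will be recalled that the hidden nodes are
displayed in area 150 of network 100. As before, for the hidden layer nodes, the expressions
for progressively improving the weights for those nodes are:

$$\Delta w_{ji} = \frac{1}{P} \sum_{p=1}^{P} \Delta w_{p,ji} = \frac{1}{P} \sum_{p=1}^{P} \eta\, \delta_{pj}\, O_{pi}$$

(Equation 4A)

OR

$$\Delta w_{ji} = \frac{1}{P} \sum_{p=1}^{P} \eta \left( \sum_{k} \delta_{pk}\, w_{kj} \right) O_{pj} \left( 1 - O_{pj} \right) O_{pi}$$

(Equation 4B)

where $O_{pi}$ is the output signal from the ith node of the layer preceding the jth layer for the pth
input data pattern.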

It should be appreciated that a "hidden layer" can be a non-linear functional
transformation layer, such as practiced in the functional link, and radial basis function
architectures.
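
The variance-conserving training described by equations (1) through (4B) can be sketched as a batch update loop. The following is a minimal sketch under stated assumptions, not the implementation of the preferred embodiment: a single sigmoidal hidden layer of eight nodes, bias terms, normally distributed initial weights, the learning rate, the iteration count, and the synthetic zero-mean 5-D data are all hypothetical choices, as are the names `train_nlvc`, `forward` and `total_variance`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def total_variance(A):
    """Total variance as in Equation 1 (rows of A are patterns)."""
    return float(((A - A.mean(axis=0)) ** 2).sum() / A.shape[0])

def forward(X, W1, b1, W2, b2):
    H = sigmoid(X @ W1 + b1)          # hidden-layer signals
    O = sigmoid(H @ W2 + b2)          # reduced-dimension outputs
    return H, O

def train_nlvc(X, n_hidden=8, n_out=2, eta=0.5, n_iter=5000, seed=0):
    rng = np.random.default_rng(seed)
    P, S = X.shape
    W1 = rng.normal(size=(S, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(size=(n_hidden, n_out)); b2 = np.zeros(n_out)
    V = total_variance(X)
    for _ in range(n_iter):
        H, O = forward(X, W1, b1, W2, b2)
        gap = V - total_variance(O)                              # bracketed term of Equation 3
        delta_out = gap * (O - O.mean(axis=0)) * O * (1.0 - O)   # Equation 3
        delta_hid = (delta_out @ W2.T) * H * (1.0 - H)           # inner factor of Equation 4B
        W2 += eta * H.T @ delta_out / P                          # Equation 2
        b2 += eta * delta_out.mean(axis=0)                       # bias update (assumption)
        W1 += eta * X.T @ delta_hid / P                          # Equation 4A
        b1 += eta * delta_hid.mean(axis=0)                       # bias update (assumption)
    return W1, b1, W2, b2

# Hypothetical zero-mean 5-D patterns, scaled so that V lies within reach of two sigmoid outputs.
rng = np.random.default_rng(1)
X = rng.uniform(-0.25, 0.25, size=(200, 5))
V = total_variance(X)

init = train_nlvc(X, n_iter=0)                    # untrained weights, same seed
_, O_init = forward(X, *init)
initial_gap = abs(V - total_variance(O_init))

params = train_nlvc(X)
_, O = forward(X, *params)
final_gap = abs(V - total_variance(O))
print(initial_gap, final_gap)
```

Since each sigmoidal output lies between 0 and 1, the two outputs together cannot carry a total variance above 0.5, so the input data in this sketch are scaled accordingly. How quickly the gap closes depends on the learning rate and the initial weights.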

The data patterns may be regarded as vectors in pattern space and their
components would vary in value depending on the coordinate system used to describe them, or
equivalently the basis vectors used to span that space.

It is a fact that the trace of the data co-variance matrix is invariant with respect
to linear transformations of the basis vectors spanning the pattern space. The present approach
allows the transformation to be nonlinear but nevertheless seeks to conserve the total variance.
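
The invariance of the trace can be checked numerically. The sketch below takes the linear transformations in question to be orthonormal changes of basis (rotations and reflections), which is an assumption of this example; a general linear map such as a uniform rescaling does change the trace, as the last line shows. The data and the name `covariance_trace` are hypothetical.

```python
import numpy as np

def covariance_trace(A):
    """Trace of the data covariance matrix, i.e. the total variance of the patterns."""
    Ac = A - A.mean(axis=0)
    return float(np.trace(Ac.T @ Ac) / A.shape[0])

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))   # correlated 4-D patterns
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))             # random orthonormal basis

trace_original = covariance_trace(X)
trace_rotated = covariance_trace(X @ Q)                  # same patterns in the new basis
trace_rescaled = covariance_trace(2.0 * X)               # not an orthonormal change of basis

print(trace_original, trace_rotated, trace_rescaled)
```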
Some simplifications can be made in the learning procedure prescribed by
equations (3) and (4A, 4B). Since interest is in the relative positions of the various patterns,
the mean values of each of the features in the original full-dimensional representation are of no
significance. Those values can be set to zero in equation (3). This amounts to a bodily shift
of the entire distribution of pattern vectors as one rigid body and no relative position
information is lost.

Similarly in equation (3), it is suitably decided ahead of time that the reduced- 
dimension distribution of pattern vectors will be one of zero mean. This removes a somewhat 
bothersome step of calculating an ever changing mean as learning proceeds. Setting the mean 
as a condition in the learning amounts to exerting a bodily shift of the relative distribution. 

Although equation (3) is retained as such, the variance constraint really amounts
to a signal strength constraint. Learning consists of learning a single set of weights which will
map all pattern vectors in the same way, striving to make each feature value, in the
reduced-dimension space, as close to zero as possible and yet conserve the total signal strength,
or equivalently, the (zero mean) variance.

Further theoretical investigations are underway but the present practice must be
considered to be motivated and justified more by results than by theory. Some
representative results are presented in this and subsequent sections of this discussion.

The results for three different approaches to reduced-dimension
self-organization are presented in this section for reduction of 3-D data to 2-D. Of course this
is a strange action to pursue if the data are intrinsically 3-D. On the other hand, the intrinsic
dimension is generally unknown in most cases and so this simple and well controlled exercise
might provide some insight as to what happens when the dimensionality of the new
representation is less than the intrinsic dimensionality.

If points along a three dimensional helix are to be represented in a
"self-organized" 2-D presentation, what would that presentation look like? In other words, what
information is discarded and what is preserved?

Data points along a helix are illustrated in Fig. 2. The 2-D self-organized K-L 
depiction of that data is shown in Fig. 3, that obtained with auto-associative mapping is shown 
in Fig. 4, and the 2-D representation obtained with this present approach is shown in Fig. 5. 

For this particular case at least, the reduced-dimension representation obtained
with this present non-linear variance conservation (NLVC) method is less ambiguous than that
obtained with the auto-associative method.

The present work provides dimension-reduction while conserving as much as
possible information regarding the inter-pattern differences. The advantages and disadvantages
of the K-L, auto-associative, and nonlinear variance-conserving methods are as follows.

The K-L Transform Method

Advantages- Well understood theoretical basis.

Disadvantages- Lengthy computation of covariance matrix; linear constraint leads to loss of
information when dimension-reduction is large.

The Auto-Associative Method

Advantages- Theoretical basis conceptually sound; nonlinear if desired.

Disadvantages- Long training times; easily overtrained to yield misleading mappings.

The Nonlinear Variance-Constraint Method

Advantages- Conceptually sound, computationally highly efficient, significant
dimension-reduction without distortions.

Disadvantages- Additional theoretical investigations would be helpful in generalizing the
approach and in revealing in what sense the mapping is "topologically" correct; computational
results all suggest that order is conserved in some non-trivial manner, but it is difficult to be
more precise than that at this point.



In the literature, there is a body of gasoline-blending data which various
investigators have used to assess the efficiency of their clustering or classification procedures.
An example of such data is exhibited in Table 1 (Fig. 11), where each gasoline blend is
described in terms of the amounts of its five constituents and also by its Research Octane
Number. That body of five-dimensional data was mapped onto a two-dimensional space using
the present NLVC approach. The resulting body of data is now easily displayed and viewed in
two dimensions as shown in Fig. 6, with non-trivial gain in the understanding of what the data
signify.

Such a plot will suitably serve as a memory as well as a classification rule
formulation device, as indicated by the drawing of the lines in Fig. 6, lines which seem to
separate the high-octane blends from the low-octane ones. In addition, in such a plot, it is
readily discerned that three of the blends do not conform with the suggested "rule." It is
interesting that similar violations of rules were observed in mappings obtained with the K-L
and auto-associative methods.

The sensitivity of the location of a point in the two-dimensional space to 
changes in the original five-dimensional space can be explored readily and some indications of 
that are depicted in Fig. 7. Such information provides guidance on how other improved 
blends might be formulated in various different composition regions. 

The present NLVC dimension-reduction system is also suitably used to map
complex time-dependent sensor output profiles into points in two dimensional space. In
addition, changes in the nature of the profile can be detected as motion of that point in the
reduced-dimension space.

In a certain industrial installation, the condition of the process was monitored by
sensors and the time-dependent sensor output profiles can be used to provide information as to
whether the operation would be in or near "fault" condition. In this study, the profiles from
one sensor were reduced to patterns of five features each, as listed in Table 2 (Fig. 12).
Two bodies of such data were prepared, one for "training" the interpretation mode and the
other for testing the usefulness of the interpretation scheme.

Using NLVC mapping, each of the profile patterns was reduced to a 2-D point
and the entire set of training profiles can be displayed in a single 2-D plot as shown in
Fig. 8.

The so-called "training" action amounts to indicating what is known of each 
point in the two dimensional space, whether that point, representing a sensor output profile, 
was associated with a "fault" or "no fault" condition. 

It turns out that, for the data processed, the profiles are indeed representative of
"fault" or "no fault" conditions. In the plot of Fig. 8, the points representing the two types of
profiles do indeed separate cleanly, albeit not linearly. Given such circumstances, a rule for
classifying a new profile as "fault" or "no fault" can be easily formulated. As shown in Fig.
9, such a rule was well validated with points from the test set of profiles.

Values of four representative crystal-structure parameters are listed in Table 3
(Fig. 13) for a number of semi-conductor materials. Listed also are values of the "band-gap"
in the electronic band structure of such materials.

NLVC mapping of the four-feature crystal-structure patterns yielded the map
shown in Fig. 9. The low band-gap materials seem to lie towards the upper left portion of the
map and study of that distribution might give some hint as to what combinations of crystal
structure might be associated with low band-gap.

The subject system is disclosed with particular emphasis on two-dimensional
displays as they are especially easy to comprehend. Three dimensional displays are suitably
accommodated by humans as well. But all higher dimensional displays are opaque to
visualization and to "understanding."

This new method is extremely efficient computationally. Experimental results
indicate that it is "topologically correct" in some powerful and attractive manner.

The subject system seeks to conserve all the original variance while carrying out 
a nonlinear mapping to a reduced-dimension space. Maps obtained in the foregoing manner 
are suitably used for a variety of tasks, and can even be used as a visual associative memory, 
suitable for storing similar descriptions of objects and of time histories of the evolution of 
objects in associated manner, so that the mapping of a new object into a region of the memory 
would give hints as to what other matters one should be reminded of. 

In variance-based approaches, the objective is to find a reduced-dimension
mapping of the data, for which much of the variance of the data is retained, and for which the
components of the data pattern vectors in the new representation are uncorrelated as much as
possible.

It is seen that this approach yields results similar to that of the feature map
method, in an interesting manner. It happens that patterns which have similar research octane
ratings are mapped automatically into contiguous regions in the 2-D reduced dimension
mapping. There is no thought of clusters. Instead a rather general category identification rule
can easily be formulated. However, the reduced-dimension map does provide guidance
towards the formulation of improved blends.

Application of the method to complex sensor data indicates once again that
patterns representing fault conditions are located in clearly self-organized regions of the 2-D
maps, distinct from the patterns representing "no-fault."

In the cases mentioned, the category or property value must have been
associated strongly with the pattern descriptions. The reduced-dimension mapping merely
makes that circumstance more obvious and more easily visualized. In yet another case, this
same approach was applied to a sparse body of data, sparse in the sense of not having many
exemplars but also sparse in the sense that many feature values were missing so that in fact
only a small subset of features was available for this exercise. The data were for a body of
crystal structure parameters for semiconductors and there was interest in seeing whether
certain regions of crystal structure "space" were associated with low band-gaps. The reduced
2-D map did give hints as to what regions might be fruitful for further exploration.

The second aspect of the present invention, Equalized Orthogonal Mapping
(EOM), will now be described with reference to Figs. 14 - 22. The intent of the EOM is to
discover and display the inter-pattern relationship between the data patterns, with the mapping
preserving the topology of the data as much as possible. This is achieved through constraining
the values of the elements of the covariance matrix of the output during the learning process. At
the end of the training, the covariance matrix of the output is reduced to the form of a constant
times the identity matrix. This guarantees that the reduced dimensions are equally important and
mutually orthogonal.

The EOM approach can be achieved with a conventional three-layer feedforward
net N with one hidden layer, as shown in Fig. 14. Net N shows the network structure for
equalized orthogonal mapping, which includes an input layer, hidden layer and output layer. The
lines between the nodes are indicative of "links" between nodes of adjacent layers. As noted
above, a "hidden layer" can be a non-linear functional transformation layer, such as practiced in
the functional link and radial basis architectures.

Net N is trained using the backpropagation algorithm. In the beginning, the
weights of the net are generated randomly in the interval [-W, W]. These weights are adjusted
iteratively through the learning process.

Let {x_p}, p = 1, 2, ..., P, be a set of input data pattern vectors of S dimensions.
The averaged variance of all dimensions of these data pattern vectors is given by:

V_{in} = \frac{1}{SP} \sum_{i=1}^{S} \sum_{p=1}^{P} \left( x_{ip} - \langle x_i \rangle \right)^2

(Equation 5)

where "< >" denotes the average or mean taken over all of the input data pattern vectors for each
indicated component (i.e., <x_i> denotes the mean value of x_{ip} evaluated over the set of
data pattern vectors), and x_{ip} is the ith component of x_p, the pth member of a set of data
pattern vectors.
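
As an illustration, the averaged variance can be computed directly from a list of pattern vectors. This is a minimal sketch, reading Equation (5) as the sum of squared deviations from each feature mean divided by S times P; the function name and the two sample patterns are hypothetical.

```python
def averaged_input_variance(patterns):
    """Average over S features of the variance taken over P patterns."""
    P = len(patterns)
    S = len(patterns[0])
    # <x_i>: mean of feature i over all patterns
    means = [sum(x[i] for x in patterns) / P for i in range(S)]
    total = sum((x[i] - means[i]) ** 2 for x in patterns for i in range(S))
    return total / (S * P)

v_in = averaged_input_variance([[0.0, 0.0], [2.0, 4.0]])
```

For the two sample patterns the feature means are 1 and 2, the squared deviations sum to 10, and dividing by S times P = 4 gives 2.5.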



To keep the generality of the discussion, assume there are K dimensions in the
reduced-dimension representation. The covariance matrix of the outputs is thus a K x K matrix.
Each element of the covariance matrix of the outputs (i.e., output signals) can be written as:

V_{out,k_1 k_2} = \frac{1}{P} \sum_{p=1}^{P} \left( O_{k_1 p} - \langle O_{k_1} \rangle \right) \left( O_{k_2 p} - \langle O_{k_2} \rangle \right)

(Equation 6)

where:

p = 1, 2, ..., P;

O_{k_1 p} is the output signal of the k_1th node of the output layer for the pth input data
pattern vector;

O_{k_2 p} is the output signal of the k_2th node of the output layer for the pth input data
pattern vector;

<O_{k_1}> is the average of O_{k_1 p} evaluated over the set of input data pattern vectors;

<O_{k_2}> is the average of O_{k_2 p} evaluated over the set of input data pattern vectors;

k_1 = 1 to K;

k_2 = 1 to K;

K is the number of dimensions in the reduced-dimension representation; and

< > denotes the mean evaluated over the set of input data pattern vectors for each
indicated component.
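
The covariance matrix of the outputs can be evaluated as sketched below. This is an illustrative reading of Equation (6), dividing by P rather than P - 1; the function name and the sample outputs are hypothetical.

```python
def output_covariance(outputs):
    """K x K covariance matrix of the outputs over P patterns (divides by P)."""
    P = len(outputs)
    K = len(outputs[0])
    # <O_k>: mean output of node k over all patterns
    means = [sum(o[k] for o in outputs) / P for k in range(K)]
    return [[sum((o[a] - means[a]) * (o[b] - means[b]) for o in outputs) / P
             for b in range(K)]
            for a in range(K)]

cov = output_covariance([[0.0, 0.0], [2.0, 4.0]])
```

The result is symmetric, which is why only the upper triangle needs to be considered in what follows.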

Due to the symmetry of the covariance matrix, only the terms in the upper triangle
of the matrix need to be considered. The objective is to minimize the error E given by:

E = \sum_{k_1=1}^{K} \sum_{k_2=k_1}^{K} E_{k_1 k_2}

(Equation 7)

where E_{k_1 k_2} is given by the following depending on whether the element is on the main
diagonal or not:

(Equation 8)

It should be understood that r_{kk} is a positive constant, which has an effect of increasing the
speed of training, and r_{k_1 k_2} is a positive constant which has an effect of increasing the
speed of training.

Moreover, it should be appreciated that by minimizing the above error function, the covariance
matrix of the output will end up in the desired form of a constant times a diagonal matrix, with a
constant times the identity matrix being a practical option.

The constant, V_{out,kk}, is targeted to be proportional to the averaged input variance
V_{in}. The constant r in Equation (8) is the relaxation factor which is usually less than unity.
It is introduced to speed up the training further by reducing the output variance of each
dimension. Since variance, which is mostly comprised of information from quadratic terms, also
resembles the energy of a system, reducing the variance thus corresponds to relaxing the energy
requirement for each dimension. This reduces the number of iterations for the net to achieve the
desired error tolerance. Since variance also captures the inter-pattern relationship of the data, by
making the variance of each output dimension proportional to the input variance, this method
seeks to preserve as much relative position information as possible. The denominator is
introduced for normalization purposes so that the error target specified will be independent of the
value of the input variance.
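
The behavior of such an error function can be sketched as follows. The exact per-element expression is an assumption here: each upper-triangle covariance element is compared with its target (r times V_in on the diagonal, zero off the diagonal), normalized by a positive constant times V_in, and squared. The names `eom_error`, `r_diag` and `r_off` are hypothetical and the sample outputs are made up.

```python
def covariance(outputs):
    """K x K covariance matrix of the outputs over P patterns."""
    P, K = len(outputs), len(outputs[0])
    means = [sum(o[k] for o in outputs) / P for k in range(K)]
    return [[sum((o[a] - means[a]) * (o[b] - means[b]) for o in outputs) / P
             for b in range(K)]
            for a in range(K)]

def eom_error(outputs, v_in, r=0.1, r_diag=1.0, r_off=1.0):
    """Assumed form: squared, V_in-normalized deviation of every upper-triangle
    covariance element from its target (r * V_in on the diagonal, 0 elsewhere)."""
    cov = covariance(outputs)
    K = len(cov)
    error = 0.0
    for k1 in range(K):
        for k2 in range(k1, K):          # upper triangle only
            if k1 == k2:
                error += ((cov[k1][k2] - r * v_in) / (r_diag * v_in)) ** 2
            else:
                error += (cov[k1][k2] / (r_off * v_in)) ** 2
    return error

balanced = [[0.5, 0.5], [0.5, -0.5], [-0.5, 0.5], [-0.5, -0.5]]
correlated = [[0.5, 0.5], [-0.5, -0.5], [0.5, 0.5], [-0.5, -0.5]]
```

With V_in = 1 and r = 0.25, the `balanced` outputs have a covariance matrix of 0.25 times the identity and give zero error, while the `correlated` outputs are penalized through the off-diagonal term.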

The expressions for updating the weights iteratively can be obtained upon taking
derivatives of the error E with respect to them. For the weights between the kth and jth layer,
using sigmoidal neurons for both the hidden (jth) layer and the output (kth) layer, this is given
by:

\Delta w_{kj} = -\eta \frac{\partial E}{\partial w_{kj}} = \Delta w_{kj,1} + \Delta w_{kj,2} + \Delta w_{kj,3}

(Equation 9)

where \Delta w_{kj,1} is the contribution from the diagonal term, \Delta w_{kj,2} is the
contribution from the off-diagonal terms in the kth row and \Delta w_{kj,3} is the contribution
from the off-diagonal terms in the kth column. The expressions of these three terms are as follows:

(Equation 10)

(Equation 11)

(Equation 12)

where δ_{kp} is a value proportional to the contribution to the error E by the outputs of the kth
node of the output layer, for the pth input data pattern vector, and δ_{kp,1}, δ_{kp,2}, and
δ_{kp,3} are components of δ_{kp}. δ_{kp,1}, δ_{kp,2} and δ_{kp,3} (for sigmoidal functions) are
given by:

\delta_{kp,1} = 4 (V_{out,kk} - r V_{in}) (\langle O_k \rangle - O_{kp}) O_{kp} (1 - O_{kp})

(Equation 13)

(Equation 14)

(Equation 15)

where O_{kp} is the output signal from the kth node in the output layer for the pth input data
pattern vector, <O_k> is the average of O_{kp} evaluated over the set of input data pattern vectors,
and O_{jp} is the output signal from the jth node in the layer preceding the output layer for the
pth input data pattern vector.

To simplify the notation, combine Equations (13), (14) and (15) and denote:

\delta_{kp} = \delta_{kp,1} + \delta_{kp,2} + \delta_{kp,3}

(Equation 16)

Equation (7) can then be rewritten in the more familiar generalized delta rule form,

(Equation 17)

Further backpropagation of error to the weights between the jth and ith layer remains the
same as in a conventional net; the expressions are:

(Equation 18)

where δ_{jp} is given by:

(Equation 19)

The EOM approach is evolved from the NLVC mapping approach. In NLVC
mapping, the objective of learning is to conserve the total variance in the description of data
patterns, that is, the weights of the net are learned such that the difference between the total
variance of the outputs and the total variance of the inputs is within a certain prescribed limit,
that is, the error function for NLVC is simply:

E = \frac{1}{4} (V_{out} - V_{in})^2

(Equation 20)

where V_{out} is given by:

V_{out} = \frac{1}{PK} \sum_{p=1}^{P} \sum_{k=1}^{K} \left( O_{kp} - \langle O_k \rangle \right)^2

(Equation 21)

and V_{in} is the same as in Equation (5). Using the exact same net structure and backpropagation
learning algorithm, the parameter δ_{kp} is given by:

(Equation 22)

The formulae for iteratively updating the network weights are of the same forms as Equations
(17) to (19).

The NLVC approach is computationally highly efficient and the resulting
reduced-dimension maps have been shown to be useful in applications such as classification,
categorization, monitoring and optimization.

One reason for the high efficiency of the NLVC approach is that the variance
conservation constraint is rather loose. Indeed, the constraint given in Equation (22) is even
weaker than that in Equation (13) alone. But this also has side effects. Given a different set of
randomly generated initial weights for the net through a different random number seed, the
resulting map for the same input looks quite different and the distribution of the points on the
map is often uneven with strong correlation between the reduced dimensions. Though it is still
possible to gain some qualitative information even from maps with uneven distribution of points,
it is usually necessary to make some trials to get a map with good distribution of points.
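
The looseness of the variance conservation constraint can be illustrated numerically. In the following minimal sketch the NLVC error is assumed to be one quarter of the squared difference between the averaged output variance and V_in; the function names and the two sample layouts are hypothetical.

```python
def averaged_output_variance(outputs):
    """Variance averaged over the K output dimensions and P patterns."""
    P, K = len(outputs), len(outputs[0])
    means = [sum(o[k] for o in outputs) / P for k in range(K)]
    return sum((o[k] - means[k]) ** 2 for o in outputs for k in range(K)) / (P * K)

def nlvc_error(outputs, v_in):
    """Assumed form of the NLVC error: a quarter of the squared variance mismatch."""
    return 0.25 * (averaged_output_variance(outputs) - v_in) ** 2

def correlation(outputs):
    """Pearson correlation between the two output dimensions."""
    P = len(outputs)
    mx = sum(o[0] for o in outputs) / P
    my = sum(o[1] for o in outputs) / P
    sxy = sum((o[0] - mx) * (o[1] - my) for o in outputs)
    sxx = sum((o[0] - mx) ** 2 for o in outputs)
    syy = sum((o[1] - my) ** 2 for o in outputs)
    return sxy / (sxx * syy) ** 0.5

belt = [[0.5, 0.5], [-0.5, -0.5], [0.5, 0.5], [-0.5, -0.5]]    # points on a diagonal
spread = [[0.5, 0.5], [0.5, -0.5], [-0.5, 0.5], [-0.5, -0.5]]  # points filling a square
```

Both layouts have the same averaged output variance, so with V_in = 0.25 both satisfy the variance constraint exactly, yet the two dimensions of `belt` are perfectly correlated while those of `spread` are uncorrelated. A variance-only constraint cannot tell the two apart.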

A good way to reduce the correlation between the reduced dimensions is to
orthogonalize the outputs of the net during the learning of the mapping. To achieve this goal, a
natural point to start is to constrain the values of the elements of the covariance matrix of the
outputs of the net. If all off-diagonal entries vanish, the outputs are orthogonal to each other.
With all the off-diagonal terms reduced to zero, it is also easy to make all the reduced dimensions
equally dominant by setting all the elements on the main diagonal of the covariance matrix to
equal values. This forces the covariance matrix to have equal eigenvalues and the variance of
each reduced dimension to be the same. To preserve as much topological information as possible
during the mapping, each element of the main diagonal can be assigned to a value related to the
average of the input variances for all the input dimensions through a relaxation factor. This is
exactly what the EOM approach does.

Compared with the NLVC approach, this current approach puts a much stronger
constraint on the learning procedure. Yet the sacrifice of efficiency is rather small. For example,
when the reduced dimension is 2-D, i.e. K = 2, which is most useful for visual display, the
covariance matrix of the outputs is a 2 x 2 matrix, and there is only one off-diagonal term which
needs to be computed compared with two diagonal terms which have to be computed by both
approaches. For each iteration of training, this only introduces a roughly 50% overhead in the
computing of Δw_{kj} using EOM compared to using NLVC. The computation for Δw_{ji} is the same
for both approaches.
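
The count of covariance terms behind the 50% figure can be written out as follows. This is only a term-counting sketch with hypothetical function names; it does not measure actual computing time.

```python
def covariance_terms(K):
    """Number of covariance elements to evaluate per training iteration."""
    diagonal = K                      # needed by both NLVC and EOM
    off_diagonal = K * (K - 1) // 2   # upper triangle only, needed by EOM alone
    return diagonal, off_diagonal

def eom_overhead(K):
    """Extra covariance terms of EOM relative to NLVC, as a fraction."""
    diagonal, off_diagonal = covariance_terms(K)
    return off_diagonal / diagonal
```

For K = 2 there are two diagonal terms and one off-diagonal term, giving `eom_overhead(2) == 0.5`. By the same count the fraction grows as (K - 1)/2 for larger K.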

Examples of use of EOM with comparisons to that of SOM and NLVC will now
be described with reference to Figs. 15 - 22. For EOM and NLVC, sigmoidal neurons were used
in both hidden and output layers. In order to visualize the resulting maps, the number of output
neurons was chosen to be 2. Since for real multidimensional data, the inherent dimensions may
not correspond to any simple physical quantity, and only the relative positions of the data points
are of interest, the absolute scales of the two reduced dimensions are of no importance. Thus, the
outputs were linearly mapped into an image of 512 x 512 pixels and no labeling is done for the
two axes.
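
One way such a linear mapping into the image could be carried out is sketched below. It assumes that each axis is scaled independently between its own minimum and maximum, which is not stated in the text; the function name `to_pixels` and the sample points are hypothetical.

```python
def to_pixels(points, size=512):
    """Min-max scale each axis independently onto pixel indices 0 .. size-1."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    def scale(v, lo, hi):
        if hi == lo:                 # degenerate axis: place everything mid-image
            return (size - 1) // 2
        return round((v - lo) / (hi - lo) * (size - 1))

    return [(scale(x, min(xs), max(xs)), scale(y, min(ys), max(ys)))
            for x, y in points]

pixels = to_pixels([(0.2, 0.9), (0.4, 0.1), (0.3, 0.5)])
```

Because only relative positions matter, the absolute output scale is discarded and the extreme points land on the image borders.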

The following simple 5-D function with 2 inherent dimensions is used as a first
test since the theoretical solution is known.

z = \sin\left( \frac{\pi}{2} (x_1 + x_2 + x_3) \right) \cos\left( 2\pi (x_4 + x_5) \right)

(Equation 23)

In Equation (23), the five variables are not all independent but are related in the following
manner:

x_1 = t_1, x_2 = 2 t_1 - 1, x_3 = 1 - t_1, x_4 = t_2, x_5 = 1 - 2 t_2

where t_1 and t_2 represent the 2 inherent dimensions of the function and are inside the interval
[0, 1]. One hundred data patterns were generated randomly in the given range and served as the
raw data set.
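
A raw data set of this kind can be generated as sketched below. The coefficient of the first sum in Equation (23) is taken to be pi/2, which is an assumption, as are the uniform sampling, the function name `make_patterns` and its `seed` parameter.

```python
import math
import random

def make_patterns(count=100, seed=0):
    """Draw (t1, t2) uniformly in [0, 1] and build the dependent 5-D patterns."""
    rng = random.Random(seed)
    data = []
    for _ in range(count):
        t1, t2 = rng.random(), rng.random()
        x = [t1, 2 * t1 - 1, 1 - t1, t2, 1 - 2 * t2]
        # Assumed coefficient pi/2 on the first sum of Equation (23).
        z = (math.sin(math.pi / 2 * (x[0] + x[1] + x[2]))
             * math.cos(2 * math.pi * (x[3] + x[4])))
        data.append((t1, t2, x, z))
    return data

patterns = make_patterns()
```

Under these assumptions x_1 + x_2 + x_3 = 2 t_1 and x_4 + x_5 = 1 - t_2, so z reduces to sin(pi t_1) cos(2 pi t_2), a function of the two inherent dimensions only.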

Using t_1 and t_2 as the two axes, an analytically generated 2-D map is shown in
Fig. 15. The gray level shown inside each label square reflects the z value of the corresponding
pair of (t_1, t_2). The range of the z values of these data points is linearly mapped to 256 gray
levels with white representing minimum and black maximum.

Figs. 16 - 18 show the mapping results of SOM, the NLVC approach and the
EOM approach. For each approach, the four maps are obtained with the same four random number
seeds. For SOM, a 20 x 20 grid was used and the Gaussian function was used as the neighborhood
(lateral excitation) function. During the learning of the mapping, the learning-rate factor α(t)
linearly decreases from 0.9 to 0 and the width of the neighborhood kernel σ(t) linearly decreases
from one half of the length of the side of the map to one unit length between grid points.
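
The two linear schedules can be written out as follows. This is a minimal sketch; the step counter `t`, the total number of steps `total_steps`, and the reading of the side of a 20 x 20 map as 20 grid units are assumptions.

```python
def learning_rate(t, total_steps, start=0.9, end=0.0):
    """Learning-rate factor decreasing linearly from `start` to `end`."""
    return start + (end - start) * t / total_steps

def kernel_width(t, total_steps, grid_side=20, end=1.0):
    """Neighborhood width decreasing linearly from half the map side to one grid unit."""
    start = grid_side / 2
    return start + (end - start) * t / total_steps
```

For example, with `total_steps=100` the learning rate starts at 0.9 and reaches 0 at the last step, while the kernel width shrinks from 10 grid units to 1.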

For NLVC and EOM, the number of hidden neurons was 15. The initial network
weight parameters are identical for these two methods. For EOM maps, the relaxation factor r
was chosen to be 0.1.

Figs. 16A, 16B, 16C and 16D show reduced dimension maps for the function
shown in Equation (23) obtained by SOM, where seed=7; seed=8; seed=4; and seed=3,
respectively. It should be understood that the "seed" is a parameter used for generating initial
reference vectors.

Figs. 17A, 17B, 17C and 17D show reduced dimension maps for the function shown
in Equation (23) obtained by the NLVC approach, where seed=7; seed=8; seed=4; and seed=3,
respectively. It should be understood that the "seed" is a parameter used for generating initial
network weights.

Figs. 18A, 18B, 18C and 18D show reduced dimension maps for the function shown
in Equation (23) obtained by the EOM approach, where seed=7; seed=8; seed=4; and seed=3,
respectively. It should be understood that the "seed" is a parameter used for generating initial
network weights.

It should be appreciated that the dark points and light points tend to separate in the
maps obtained by SOM, but they hardly look like the theoretical map and neither do they look
alike to each other. Since the data points are restricted to the grid points on the map, information
on the fine relative positions of the data points as shown on the analytically generated map is
lost. Different random number seeds also cause the resulting maps to look different. However,
SOM maps usually give good coverage of the map area although that is restricted to grid points.

The apparent differences in the four maps obtained with the NLVC approach
show that there is a strong dependency between the choice of the initial weights and the result of
the mapping. However, upon close comparison between each of the four maps and the
analytically generated one, it can be seen that despite the apparent differences of the four maps,
they all can be transformed from the analytically generated map by some combinations of
rotation, reflection and compression. That is to say, although the distribution of the data points
as a whole is distorted to different degrees in these maps as compared to the analytically
generated one, the relative positions of the data points seem to be preserved in them. In other
words, the topology of the data points seems to be preserved in these maps in some local manner.
However, the diagonal belt shape of distribution exhibited in the maps of Figs. 17B, 17C, 17D,
and to a lesser degree 17A, implies strong correlation between the two reduced dimensions. As a
consequence, these maps fail to utilize the full capacity of the reduced dimensions. Though these
maps are topologically correct in terms of the relative positions of the data points, the map of
Fig. 17D is effectively useless and maps of Figs. 17B and 17C may only be used for some
qualitative descriptions of the data points. Only the map of Fig. 17A shows a relatively good
distribution of data points which can be used in quantitative tasks such as optimization. These
four maps serve as a good example to illustrate the need for a better mapping approach which not
only keeps the map topologically correct but also reduces the randomness of the mapping and
fully utilizes the reduced dimensions.

The four maps obtained with the EOM approach on the other hand show 
remarkable resemblance to each other and to the analytically generated one. Apart from rotation, 
reflection and the difference in scaling which was done automatically to fit the image, all four 
maps are essentially identical to the analytically generated one. This shows the robustness of the 
EOM approach in handling different initial conditions. One small detail to note is that the angle 
of rotation in these maps is either around 45° or 0°. Since theoretically the distribution of the 
data points forms a square region, and a square at those two angles makes the two dimensions 
equally dominant, this observation is reassurance that the EOM approach does achieve the goal 
to make full utilization of the reduced dimensions. 

As for the computational efficiency, the case of 7 being used as the random
number seed is used as an example for an empirical comparison. The EOM approach took less
than 6 seconds to converge in 178 iterations. The NLVC approach took less than 2 seconds to
converge in 12 iterations and the SOM approach took 117 seconds for 100 iterations. The
efficiency improvement over SOM is significant. Although EOM takes longer than an individual
NLVC run, it may still end up as a winner if a satisfactory map is not found in the first few trials
of NLVC runs.

There is a body of gasoline blending data in the literature, a subset of which,
containing those patterns with all attributes known, is given in the table of Fig. 11. This set of
data has been shown to "self-organize" into two almost distinct regions for patterns with octane
ratings higher than 100 and for those below 100 upon dimension reduction to 2-D using both
the autoassociative approach and the NLVC approach.

Figs. 19A - 19D show reduced dimension maps for the gasoline blending data
shown in the table of Fig. 11 obtained by SOM, with seed=7; seed=8; seed=4; and seed=3,
respectively. Figs. 20A - 20D show reduced dimension maps for gasoline blending data shown
in the table of Fig. 11 obtained by the NLVC approach, with seed=7; seed=8; seed=4; and seed=3,
respectively. Figs. 21A - 21D show reduced dimension maps for gasoline blending data shown
in the table of Fig. 11 obtained by the EOM approach, with seed=7; seed=8; seed=4; and seed=3,
respectively. Figs. 22A and 22B show a reduced dimension map based on the six gasoline
blending data patterns of high octane ratings. Fig. 22A illustrates a map of the six patterns
obtained by the EOM approach, while Fig. 22B illustrates model values of this region.

For SOM, a 10 x 10 grid was used and the choices of α(t) and σ(t) were the same as
above. For NLVC and EOM, the exact same net architectures were used to obtain the reduced
dimension maps. Even the same random number seeds were used. Figs. 19 to 21 show mapping
results of SOM, the NLVC approach and the EOM approach. The gray level inside each label
square reflects the octane rating of that pattern with lighter ones corresponding to higher octane
ratings. Since there are only 26 patterns, pattern numbers are also shown.

Once again, the SOM maps show separation of data points based on octane ratings
to some degree with the map of Fig. 19A providing the best result. The dependency on initial
parameters is again evident since the maps are quite different.

The NLVC maps again show the belt shape distributions of data points indicating
the presence of correlation. But nevertheless all four maps show reasonable coverage of the map
area to make them useful, at least for qualitative discussions. By examining the gray levels of
the label squares, it can be seen that all four maps show some degree of separation between the
light-shade points and the dark-shade ones with the map of Fig. 20B providing the best result.
This agrees with previous NLVC results using different maps.

The EOM maps as expected show better coverage of the map area. The
separation of high-octane data points and those of low-octane is even more evident on these
maps. However, in maps obtained using both NLVC and EOM approaches, it is also evident that
relative positions of data points in these four maps are not kept the same as in the case of the
mathematical example, especially for those points which are close to each other as shown in the
maps, with the EOM maps showing less variation than the NLVC maps. This, however, does not
mean that these maps failed to preserve the topology of the data set, but rather shows that the
inherent dimension of the data set is actually higher than 2. Since it is not possible to show all
the topology information in one 2-dimensional map for a data set of higher inherent dimension,
different maps just represent projections from different "angles". This is analogous to the
blueprint of a part, which needs projections from three sides to show the topology of that part,
though the projection process is a non-linear one for these dimension reduction maps.

Since the mathematical example demonstrated that the EOM approach essentially 
maintains map invariability for data with two inherent dimensions, any change except rotation 
and reflection in maps obtained from different initial weights is an indication that the inherent 
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d.mension of the data set is higher than then the dimension of the map. However, even witn 
some changes evident, it does not necessarily render the resulting maps useless if these variations 
are not completely dominant. Much information can still be gathered since the different inherent 
dimensions of the data set may not be of equal importance. This is exactly the case for the 
gasoline blending data. Comparing the four maps obtained by the EOM approach, it can be seen 
that Patterns 2, 3, 13, 14, 16 and 22, which have high octane ratings, form a distinct group in all 
four maps. Furthermore, all of the above patterns except Pattern 13 show up at least once on the 
edge of a map. This may indicate that Pattern 13, which gives the highest octane rating so far, is 
surrounded by the other five high-octane patterns, and that blends with even higher octane ratings 
might be found in this region. This is less evident on the NLVC maps due to distortions from the initial 
network parameters. 
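
The comparison described above, in which maps obtained from different initial weights are checked for changes other than rotation and reflection, can be sketched in code. The following is only an illustrative sketch and not part of the described system: the function names `center` and `map_discrepancy` are hypothetical, the two maps are assumed to list the same patterns in the same order, and the closed-form 2-D alignment used here is a standard technique swapped in for illustration. A value near zero suggests that the two maps agree up to translation, rotation and reflection; a clearly larger value suggests that the inherent dimension of the data exceeds that of the map.

```python
import math

def center(points):
    """Shift a list of (x, y) points so that their mean is at the origin."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    return [(x - mean_x, y - mean_y) for x, y in points]

def map_discrepancy(map_a, map_b):
    """RMS point distance between two 2-D maps after the best rotation
    (with or without a reflection) of map_b onto map_a.

    Hypothetical helper: both maps must hold the same patterns in the same order.
    """
    if len(map_a) != len(map_b):
        raise ValueError("maps must contain the same number of points")
    a = center(map_a)
    b = center(map_b)
    total = sum(x * x + y * y for x, y in a) + sum(x * x + y * y for x, y in b)
    best = float("inf")
    for flip in (1.0, -1.0):  # flip = -1 mirrors map_b about the x axis first
        s_cos = sum(ax * bx + ay * flip * by for (ax, ay), (bx, by) in zip(a, b))
        s_sin = sum(ay * bx - ax * flip * by for (ax, ay), (bx, by) in zip(a, b))
        # For the optimal rotation angle the cross term equals hypot(s_cos, s_sin).
        residual = total - 2.0 * math.hypot(s_cos, s_sin)
        best = min(best, max(residual, 0.0))
    return math.sqrt(best / len(a))

# Made-up coordinates for illustration only; not values taken from the figures.
run_1 = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (0.0, 3.0)]
run_2 = [(-y, x) for x, y in run_1]  # run_1 rotated by 90 degrees
print(map_discrepancy(run_1, run_2))
```

No rescaling is applied in this sketch, since the text above treats only rotation and reflection as harmless differences between maps.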

These six patterns were isolated from the rest of the set and an EOM map was 
generated for them. This is shown in Fig. 22A. Due to independent mappings of intensity to 
octane ranges, the exact shades of the six patterns are different from those in Fig. 21. The map 
indeed shows that Pattern 13 is surrounded by the other five patterns. 

A model of the 2-dimensional function was learned using the random vector version 
of the functional-link net. This model was used to predict the octane ratings in the region shown 
in Fig. 22A. The result is given in Fig. 22B. This figure shows that a point of even higher 
octane rating is located at: 

d_1 = 85.51, d_2 = 173.5. 

The corresponding expected octane rating is: 
z = 102.4. 

The (d_1, d_2) value can be mapped back into the original 5-D space with another random 
vector functional-link net. The results are: 

x_1 = 0.226, x_2 = 0.096, x_3 = 0.058, x_4 = 0.022, x_5 = 0.599. 

It should be noted that, due to the limited number of patterns available for constructing the 
network models, the above results should be regarded more as guidance for 
future formulations than as accurate predictions. 
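
The modeling step described above, in which a random vector functional-link net is fitted over the 2-D map and then searched for a point of higher predicted rating, might be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation used to produce Fig. 22B: the class name `RandomVectorFLN`, the helper `best_point_on_grid`, the tanh enhancement nodes, the number of hidden nodes, the ridge term and the grid search are all illustrative choices, and the training data are synthetic rather than the gasoline blending data. Inputs are assumed to be scaled to roughly unit range.

```python
import numpy as np

class RandomVectorFLN:
    """Sketch of a random vector functional-link net: fixed random enhancement
    nodes plus direct input links, with output weights found by least squares."""

    def __init__(self, n_inputs, n_hidden=20, ridge=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        # Random hidden weights and biases are drawn once and never trained.
        self.W = rng.uniform(-1.0, 1.0, size=(n_inputs, n_hidden))
        self.b = rng.uniform(-1.0, 1.0, size=n_hidden)
        self.ridge = ridge  # small regularizer (an assumption of this sketch)
        self.beta = None

    def _features(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        H = np.tanh(X @ self.W + self.b)                # enhancement nodes
        ones = np.ones((X.shape[0], 1))                 # bias column
        return np.hstack([X, H, ones])                  # direct links + enhancements

    def fit(self, X, y):
        F = self._features(X)
        A = F.T @ F + self.ridge * np.eye(F.shape[1])
        self.beta = np.linalg.solve(A, F.T @ np.asarray(y, dtype=float))
        return self

    def predict(self, X):
        return self._features(X) @ self.beta

def best_point_on_grid(model, lo, hi, steps=21):
    """Evaluate a scalar-output model on a regular 2-D grid; return the grid
    point with the highest prediction and that prediction."""
    axis_1 = np.linspace(lo[0], hi[0], steps)
    axis_2 = np.linspace(lo[1], hi[1], steps)
    g1, g2 = np.meshgrid(axis_1, axis_2, indexing="ij")
    grid = np.column_stack([g1.ravel(), g2.ravel()])
    z = model.predict(grid)
    k = int(np.argmax(z))
    return grid[k], float(z[k])

# Synthetic stand-in data (NOT the octane data): a smooth bump over the unit square.
rng = np.random.default_rng(1)
d = rng.uniform(0.0, 1.0, size=(40, 2))
z = 90.0 + 10.0 * np.exp(-((d - np.array([0.6, 0.3])) ** 2).sum(axis=1) / 0.1)
net = RandomVectorFLN(n_inputs=2).fit(d, z)
print(best_point_on_grid(net, (0.0, 0.0), (1.0, 1.0)))
```

Because `fit` accepts a target matrix as well as a target vector, a second net of the same form could in principle be trained from map coordinates to the original five features, in the spirit of the inverse mapping described above. As the text cautions, any such model fitted to a handful of patterns offers guidance rather than accurate prediction.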

The present invention provides a new and unique approach to obtaining topologically 
correct reduced dimension maps which can help visualize multidimensional data patterns. This 
approach is demonstrated to be able to reduce the randomness in the resulting maps due to the 
difference in the choice of initial network weights, as is evident in other approaches of similar 
purpose. In addition, by choosing different initial weights, this approach can easily show whether 
the original data set can be described satisfactorily using the reduced dimension map. 
The maps obtained by this approach fully utilize the map area and can be used to substitute for maps 
obtained using other approaches of similar purpose in various applications. 

The invention has been described with reference to the preferred embodiment. 
Obviously, modifications and alterations will occur to others upon a reading and understanding 
of this specification. It is intended to include all such modifications and alterations insofar as 
they come within the scope of the appended claims or the equivalents thereof. 